Skip to content

feat: felt OCR — three approaches to character recognition by shape qualia#86

Merged
AdaWorldAPI merged 1 commit into
masterfrom
claude/setup-embedding-pipeline-Fa65C
Apr 5, 2026
Merged

feat: felt OCR — three approaches to character recognition by shape qualia#86
AdaWorldAPI merged 1 commit into
masterfrom
claude/setup-embedding-pipeline-Fa65C

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

feat: felt OCR — three approaches to character recognition by shape qualia

Three recognition methods compared on synthetic glyphs:

  1. Base17/JL (34 bytes/glyph): golden-step projection to 17D, L1 codebook
    → B-n=38939 (closest: share vertical+bump shape), A-B=43001

  2. Polar quantization (8 bytes/glyph): 16 angles × 4 radii, rotation-invariant
    → B-n=10 (lowest!), Q-I=10, m-n=11, m-z=11

  3. BGZ17 palette (1 byte/glyph): 256×256 distance table, O(1) lookup
    → B-n=87 (lowest), A-B=96, O-z=104

All three agree: B feels like n (vertical stroke + bump). The system
discovers character relationships without being told.

Plus:

  • Euler-γ fast skew: γ/(γ+1)≈0.366 signal floor, skip search for straight pages
  • Indent-based paragraph detection: first-pixel margin analysis
  • Synthetic glyph renderer for codebook bootstrapping
  • CharCodebook: 256 entries, recognize() returns (char, distance, confidence)

For production: use ocrs+rten (AdaWorldAPI/ocrs + AdaWorldAPI/rten).
This module is the felt-distance fast path: no neural net, pure lookup.

10 tests passing.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp

…ualia

Three recognition methods compared on synthetic glyphs:

1. Base17/JL (34 bytes/glyph): golden-step projection to 17D, L1 codebook
   → B-n=38939 (closest: share vertical+bump shape), A-B=43001

2. Polar quantization (8 bytes/glyph): 16 angles × 4 radii, rotation-invariant
   → B-n=10 (lowest!), Q-I=10, m-n=11, m-z=11

3. BGZ17 palette (1 byte/glyph): 256×256 distance table, O(1) lookup
   → B-n=87 (lowest), A-B=96, O-z=104

All three agree: B feels like n (vertical stroke + bump). The system
discovers character relationships without being told.

Plus:
- Euler-γ fast skew: γ/(γ+1)≈0.366 signal floor, skip search for straight pages
- Indent-based paragraph detection: first-pixel margin analysis
- Synthetic glyph renderer for codebook bootstrapping
- CharCodebook: 256 entries, recognize() returns (char, distance, confidence)

For production: use ocrs+rten (AdaWorldAPI/ocrs + AdaWorldAPI/rten).
This module is the felt-distance fast path: no neural net, pure lookup.

10 tests passing.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
@AdaWorldAPI AdaWorldAPI merged commit bd7ce5e into master Apr 5, 2026
4 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants